Crowd Labeling: a survey
Authors
Abstract
Crowd computing empowers computer systems by drawing on human perception and on people's ability to solve non-algorithmic problems. In this approach, a group of people is asked to collaboratively solve a problem that cannot be solved easily by individuals, or perfectly by computers. However, using humans to solve problems introduces complexities such as the lack of generative models, complex cost models, lower speed compared with computers, limited knowledge and skills, and noise, bias, and error. An optimized crowd computing system should overcome these complexities and improve the quality of solutions. This paper addresses three main questions: What is crowd computing? Why should one use crowd computing? And how should crowd computing be used? We briefly answer the first two questions and focus on the third, especially on solving classification problems using the multiple-checking scenario. In addition, we compare current crowd computing methods and provide guidelines for future work based on the open issues in this field.
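The multiple-checking scenario is commonly read as asking several workers to label the same item and then combining their answers. As an illustration only, the Python sketch below shows the simplest such combination, plain majority voting. It is a common baseline rather than a method this survey prescribes. The function name `majority_vote` and the example data are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List


def majority_vote(labels: Dict[Hashable, List[Hashable]]) -> Dict[Hashable, Hashable]:
    """Combine several crowd labels per item into one label by majority vote.

    `labels` maps each item id to the labels different workers gave it.
    Ties go to the tied label that appears first in that item's list,
    because Counter.most_common keeps first-encountered order for equal counts.
    """
    aggregated = {}
    for item, votes in labels.items():
        if not votes:
            continue  # skip items that have not been labeled yet
        aggregated[item] = Counter(votes).most_common(1)[0][0]
    return aggregated


if __name__ == "__main__":
    # Hypothetical example: three images, each checked by two or three workers.
    crowd_labels = {
        "img1": ["cat", "cat", "dog"],
        "img2": ["dog", "dog", "dog"],
        "img3": ["cat", "dog"],
    }
    print(majority_vote(crowd_labels))
```

Majority voting treats every worker as equally reliable. Noise, bias, and low-quality labelers undermine exactly that assumption.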
Similar papers
Toward a Robust Crowd-labeling Framework using Expert Evaluation and Pairwise Comparison
Crowd-labeling emerged from the need to label large-scale and complex data, a tedious, expensive, and time-consuming task. One of the main challenges in the crowd-labeling task is to control for or determine in advance the proportion of low-quality/malicious labelers. If that proportion grows too high, there is often a phase transition leading to a steep, non-linear drop in labeling accuracy as...
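The phase transition described in this abstract can be illustrated with a toy model. This is our own assumption, not the cited paper's analysis. Labels are binary, and each of n labelers is independently malicious with probability q, in which case it always answers wrong. Otherwise the labeler answers correctly with probability p. A single label is then correct with probability r = (1 - q)p, so majority-vote accuracy is the binomial tail, the sum over k > n/2 of C(n, k) r^k (1 - r)^(n - k). The hypothetical helper `majority_accuracy` computes it. As n grows, the drop around (1 - q)p = 1/2 becomes sharper; with p = 0.9 this happens near q ≈ 0.44.

```python
from math import comb


def majority_accuracy(n: int, q: float, p: float) -> float:
    """Probability that a strict majority of n binary labels is correct.

    Toy model (assumption): each labeler is malicious with probability q and
    then always answers wrong; otherwise it answers correctly with
    probability p. Labelers act independently, and ties count as errors.
    """
    r = (1 - q) * p  # probability that a single label is correct
    return sum(comb(n, k) * r**k * (1 - r) ** (n - k) for k in range(n // 2 + 1, n + 1))


if __name__ == "__main__":
    # Sweep the malicious fraction q for a small and a large crowd (p = 0.9).
    for q in (0.0, 0.2, 0.3, 0.4, 0.5, 0.6):
        small = majority_accuracy(5, q, 0.9)
        large = majority_accuracy(51, q, 0.9)
        print(f"q={q:.1f}  n=5: {small:.3f}  n=51: {large:.3f}")
```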
Quality Control of Crowd Labeling through Expert Evaluation
We propose a general scheme for quality-controlled labeling of large-scale data using multiple labels from the crowd and a “few” ground truth labels from an expert of the field. Expert-labeled instances are used to assign weights to the expertise of each crowd labeler and to the difficulty of each instance. Ground truth labels for all instances are then approximated through those weights along ...
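The scheme in this abstract estimates labeler expertise from a few expert-labeled instances. The Python sketch below illustrates only that part, under assumptions of our own:

- Each worker's accuracy on the expert-labeled (gold) items is Laplace-smoothed.
- That accuracy a is turned into a one-coin log-odds weight log((K - 1)a / (1 - a)) for K classes.
- The weights drive a weighted vote, and gold items keep the expert's label.

The sketch does not model instance difficulty, which the cited work also weights. The names `worker_weights` and `weighted_vote` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Tuple

Label = Hashable
# Maps (worker id, item id) to the label that worker gave that item.
Votes = Dict[Tuple[str, str], Label]


def worker_weights(votes: Votes, gold: Dict[str, Label], num_classes: int) -> Dict[str, float]:
    """Weight each worker by its smoothed accuracy on expert-labeled items.

    Accuracy is Laplace-smoothed as (correct + 1) / (checked + 2), so a worker
    with no gold items gets 0.5. The weight is the one-coin log-odds
    log((K - 1) * a / (1 - a)); for K = 2 a coin-flipping worker gets 0.
    """
    correct: Dict[str, int] = defaultdict(int)
    checked: Dict[str, int] = defaultdict(int)
    workers = set()
    for (worker, item), label in votes.items():
        workers.add(worker)
        if item in gold:
            checked[worker] += 1
            correct[worker] += int(label == gold[item])
    weights = {}
    for worker in workers:
        acc = (correct[worker] + 1) / (checked[worker] + 2)
        weights[worker] = math.log((num_classes - 1) * acc / (1 - acc))
    return weights


def weighted_vote(votes: Votes, gold: Dict[str, Label], num_classes: int) -> Dict[str, Label]:
    """Estimate one label per item by a worker-weighted vote."""
    weights = worker_weights(votes, gold, num_classes)
    scores: Dict[str, Dict[Label, float]] = defaultdict(lambda: defaultdict(float))
    for (worker, item), label in votes.items():
        scores[item][label] += weights[worker]
    estimate = {}
    for item, by_label in scores.items():
        # Trust the expert on gold items; elsewhere take the highest-scoring label.
        estimate[item] = gold[item] if item in gold else max(by_label, key=by_label.get)
    return estimate


if __name__ == "__main__":
    gold = {"g1": 1, "g2": 0}  # hypothetical expert labels
    votes = {
        ("a", "g1"): 1, ("a", "g2"): 0, ("a", "x"): 1,  # a: right on both gold items
        ("b", "g1"): 1, ("b", "g2"): 1, ("b", "x"): 0,  # b: right on one
        ("c", "g1"): 0, ("c", "g2"): 0, ("c", "x"): 0,  # c: right on one
    }
    print(weighted_vote(votes, gold, num_classes=2))
```

In this example, plain majority voting would label item x as 0. The weighted vote instead follows worker a, the only worker who agreed with the expert on every gold item, and returns 1.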
Sembler: Ensembling Crowd Sequential Labeling for Improved Quality
Many natural language processing tasks, such as named entity recognition (NER), part-of-speech (POS) tagging, and word segmentation, can be formulated as sequential data labeling problems. Building a sound labeler requires a very large number of correctly labeled training examples, which may not always be possible. On the other hand, crowdsourcing provides an inexpensive yet efficient alter...
Robust Crowd Labeling Using Little Expertise
Crowd-labeling emerged from the need to label large-scale and complex data, a tedious, expensive, and time-consuming task. But the problem of obtaining good-quality labels from a crowd and integrating them is still unresolved. To address this challenge, we propose a new framework that automatically combines and boosts bulk crowd labels supported by a limited number of “ground truth” labels from ...
Speeding up Crowds for Low-latency Data Labeling
Data labeling is a necessary but often slow process that impedes the development of interactive systems for modern data analysis. Despite rising demand for manual data labeling, there is a surprising lack of work addressing its high and unpredictable latency. In this paper, we introduce CLAMShell, a system that speeds up crowds in order to achieve consistently low-latency data labeling. We offer...